
[AutoSparkUT] Fix GpuScalar to preserve -0.0 for float/double (issue #14116) #14400

Open
wjxiz1992 wants to merge 2 commits into NVIDIA:main from wjxiz1992:fix/14116-normalize-float-subquery

Conversation

@wjxiz1992
Collaborator

Summary

  • Root cause: cuDF Scalar.fromDouble(-0.0) normalizes -0.0 to 0.0, losing the IEEE 754 sign bit. This caused scalar subquery results to return 0.0 where CPU returns -0.0, violating GPU-CPU parity.
  • Fix: In GpuScalar.from(), create float/double scalars via a 1-element ColumnVector + getScalarElement(0) instead of Scalar.fromDouble/Scalar.fromFloat. The column-based path preserves the exact bit pattern.
  • Recovered test: Removes the .exclude() for "normalize special floating numbers in subquery" in RapidsSQLQuerySuite.

Detail

The GpuScalarSubquery path extracts a JVM Double from the inner query result, then creates a GpuScalar via Scalar.fromDouble(value). cuDF's JNI layer (makeFloat64Scalar) normalizes -0.0 to 0.0 during scalar creation. This was confirmed by direct cuDF API testing:

cudf_scalar=0                    <- Scalar.fromDouble(-0.0) bits=0 (0.0)
cudf_col=0                       <- ColumnVector from that scalar also 0.0
plain_v1=8000000000000000        <- Direct SELECT preserves -0.0
subq_no_join=0                   <- Scalar subquery loses -0.0
join_no_subq=8000000000000000    <- Join without subquery preserves -0.0

The fix replaces Scalar.fromDouble(d) with ColumnVector.fromDoubles(d).getScalarElement(0), which preserves the bit pattern through the host-to-device column creation path.
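As an illustrative sketch (plain Java, no cuDF dependency; class and method names here are hypothetical, not from the PR), the negative-zero detection that the final version of this PR relies on can be expressed as a raw-bit comparison, since -0.0 is the only double whose bits equal 0x8000000000000000L even though (-0.0 == 0.0) is true under IEEE 754:

```java
public class NegZeroCheck {
  // Raw-bit comparison distinguishes -0.0 from 0.0, which an ordinary
  // == comparison cannot do (IEEE 754 defines -0.0 == 0.0 as true).
  static boolean isNegativeZeroDouble(double d) {
    return Double.doubleToRawLongBits(d) == Double.doubleToRawLongBits(-0.0d);
  }

  static boolean isNegativeZeroFloat(float f) {
    return Float.floatToRawIntBits(f) == Float.floatToRawIntBits(-0.0f);
  }

  public static void main(String[] args) {
    // Only negative zero would take the slow (column-based) path;
    // every other value keeps the direct scalar constructors.
    System.out.println(isNegativeZeroDouble(-0.0d)); // true
    System.out.println(isNegativeZeroDouble(0.0d));  // false
    System.out.println(isNegativeZeroFloat(-0.0f));  // true
    System.out.println(isNegativeZeroFloat(1.0f));   // false
  }
}
```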

RAPIDS test to Spark original mapping

RAPIDS test: normalize special floating numbers in subquery (inherited)
Spark original: normalize special floating numbers in subquery
Spark file: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (lines 3620-3636)

Test plan

  • mvn package -pl tests -am -Dbuildver=330 -DwildcardSuites=RapidsSQLQuerySuite -- succeeded 216, failed 0, ignored 17 (previously: succeeded 215, ignored 18)
  • The previously excluded test "normalize special floating numbers in subquery" now passes
  • No new test failures introduced

Closes #14116

Made with Cursor

…VIDIA#14116)

cuDF's Scalar.fromDouble(-0.0) normalizes -0.0 to 0.0, losing the sign
bit. This caused GPU scalar subqueries to return 0.0 where CPU correctly
returns -0.0, violating GPU-CPU parity.

Root cause: the JNI path Scalar.fromDouble -> makeFloat64Scalar drops
the IEEE 754 sign bit of negative zero during scalar creation.

Fix: in GpuScalar.from(), create float/double scalars via a 1-element
ColumnVector + getScalarElement(0) instead of Scalar.fromDouble/fromFloat.
The column-based path preserves the exact bit pattern.

This re-enables the previously excluded test "normalize special floating
numbers in subquery" in RapidsSQLQuerySuite.

Closes NVIDIA#14116

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
Copilot AI review requested due to automatic review settings March 11, 2026 09:18
@greptile-apps
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR fixes a GPU-CPU parity bug where GpuScalar.from() would lose the IEEE 754 sign bit of -0.0 for Float and Double types, because cuDF's Scalar.fromDouble/Scalar.fromFloat normalizes -0.0 to 0.0 at the JNI level.

Key changes:

  • In GpuScalar.from(), the DoubleType and FloatType branches now detect -0.0 via doubleToRawLongBits/floatToRawIntBits and, when the value is negative zero, create the scalar via a 1-element ColumnVector + getScalarElement(0) instead of Scalar.fromDouble/Scalar.fromFloat. All other values continue to use the cheaper direct scalar path.
  • The "normalize special floating numbers in subquery" exclusion is removed from RapidsTestSettings, allowing the previously skipped test to run and verify the fix.
  • Resource management in the new code paths is correct — withResource ensures the temporary ColumnVector is closed before the Scalar is returned.

Confidence Score: 4/5

  • This PR is safe to merge; the fix is narrowly scoped, logically correct, and validated by a recovered test.
  • The -0.0 detection logic using raw bit comparison is correct, the withResource usage prevents resource leaks, and the change is confined to a well-understood special case. Score is 4 rather than 5 only because of minor code duplication across the three case branches and the fact that analogous NaN-payload normalization issues (if they exist in cuDF) would not be covered by this fix.
  • No files require special attention; both changed files are straightforward.

Important Files Changed

sql-plugin/src/main/scala/com/nvidia/spark/rapids/literals.scala: Adds a -0.0 special-case for Double and Float scalar creation: when the value is negative zero, a 1-element ColumnVector is used instead of Scalar.fromDouble/Scalar.fromFloat to preserve the IEEE 754 sign bit. Logic and resource management are correct; minor code duplication across three case branches.
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala: Removes the .exclude("normalize special floating numbers in subquery", ...) entry, re-enabling the previously skipped test now that the underlying -0.0 scalar bug is fixed.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["GpuScalar.from(v, dataType)"] --> B{dataType}
    B --> |DoubleType| C{v type?}
    B --> |FloatType| G{v type?}

    C --> |"Double d"| D{"is negative zero?"}
    C --> |"Float f"| E["d = f.toDouble"] --> D

    D --> |yes| F["ColumnVector.fromDoubles(d) then getScalarElement(0) - preserves sign bit"]
    D --> |no| N["Scalar.fromDouble(d) - fast path"]

    G --> |"Float f"| H{"is negative zero?"}
    H --> |yes| I["ColumnVector.fromFloats(f) then getScalarElement(0) - preserves sign bit"]
    H --> |no| J["Scalar.fromFloat(f) - fast path"]

    F --> K[Return Scalar]
    N --> K
    I --> K
    J --> K

Comments Outside Diff (2)

  1. sql-plugin/src/main/scala/com/nvidia/spark/rapids/literals.scala, line 282-344 (link)

    Code duplication and simplified -0.0 detection

    The fix has duplicated withResource(ColumnVector.fromDoubles(d)) { cv => cv.getScalarElement(0) } blocks in three separate places (double/double, double/float, float/float). Additionally, the condition JDouble.doubleToRawLongBits(d) == JDouble.doubleToRawLongBits(-0.0d) re-computes doubleToRawLongBits(-0.0d) at each call site instead of using a constant. The check can be simplified to JDouble.doubleToRawLongBits(d) == 0x8000000000000000L (since -0.0d always has that bit pattern), and the duplicate blocks can be extracted into a helper. This is a style concern, not a correctness issue.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. sql-plugin/src/main/scala/com/nvidia/spark/rapids/literals.scala, line 292-300 (link)

    Float-to-Double -0.0 can bypass the column-vector path

    When a Float is provided for DoubleType, the code does val d = f.toDouble and then checks doubleToRawLongBits(d) == doubleToRawLongBits(-0.0d). IEEE 754 mandates that -0.0f.toDouble == -0.0d, so the check works correctly today. However, if cuDF ever also normalizes other Float-specific NaN payloads during float→double widening through ColumnVector, this path would still not catch them. This is not a bug for the current fix, but worth noting as a future-proofing consideration.
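A quick standalone check (plain Java, no cuDF involved) confirms the IEEE 754 point above: widening -0.0f to double preserves the sign bit, so a raw-bit comparison on the widened value still detects negative zero:

```java
public class WidenNegZero {
  public static void main(String[] args) {
    // float-to-double widening keeps the sign of zero
    double d = (double) -0.0f;

    // The raw bit pattern of the widened value is the double -0.0 pattern.
    System.out.println(Long.toHexString(Double.doubleToRawLongBits(d))); // 8000000000000000

    // An == comparison cannot distinguish the zeros; the raw-bit check can.
    System.out.println(d == 0.0d); // true (IEEE 754 treats the zeros as equal)
    System.out.println(Double.doubleToRawLongBits(d) == 0x8000000000000000L); // true
  }
}
```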

Last reviewed commit: dbc46f0

Contributor

Copilot AI left a comment


Pull request overview

Fixes GPU/CPU parity for scalar subquery results involving signed zero by ensuring GpuScalar.from preserves the IEEE-754 -0.0 bit pattern for float/double values, and re-enables the previously excluded Spark-derived test that covers this case.

Changes:

  • Update GpuScalar.from to create float/double cuDF scalars via a 1-element ColumnVector + getScalarElement(0) to preserve -0.0.
  • Remove the exclusion for "normalize special floating numbers in subquery" from Spark 3.3.0 test settings.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

sql-plugin/src/main/scala/com/nvidia/spark/rapids/literals.scala: Changes float/double scalar creation path to preserve -0.0 bit pattern.
tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala: Re-enables the previously excluded subquery normalization test.


Comment on lines 283 to 294
case d: Double =>
  // cuDF Scalar.fromDouble normalizes -0.0 to 0.0 (see #14116).
  // Create via a 1-element column to preserve the exact bit pattern.
  withResource(ColumnVector.fromDoubles(d)) { cv =>
    cv.getScalarElement(0)
  }
case f: Float =>
  withResource(ColumnVector.fromDoubles(f.toDouble)) { cv =>
    cv.getScalarElement(0)
  }
case _ => throw new IllegalArgumentException(s"'$v: ${v.getClass}' is not supported" +
  s" for DoubleType, expecting Double or Float.")

Copilot AI Mar 11, 2026


Creating float/double scalars by materializing a 1-element ColumnVector will allocate/copy device memory on every scalar creation, which can be a noticeable perf/regression vs Scalar.fromDouble/fromFloat (GpuScalar.from is used in many code paths). Consider using the ColumnVector+getScalarElement workaround only for the specific problematic value (signed zero), and keep the fast Scalar.fromDouble path for all other doubles/floats.

Suggested change: replace

case d: Double =>
  // cuDF Scalar.fromDouble normalizes -0.0 to 0.0 (see #14116).
  // Create via a 1-element column to preserve the exact bit pattern.
  withResource(ColumnVector.fromDoubles(d)) { cv =>
    cv.getScalarElement(0)
  }
case f: Float =>
  withResource(ColumnVector.fromDoubles(f.toDouble)) { cv =>
    cv.getScalarElement(0)
  }
case _ => throw new IllegalArgumentException(s"'$v: ${v.getClass}' is not supported" +
  s" for DoubleType, expecting Double or Float.")

with

case d: Double =>
  // cuDF Scalar.fromDouble normalizes -0.0 to 0.0 (see #14116).
  // Use a 1-element column only for negative zero to preserve the exact bit pattern.
  if (d == 0.0 && JDouble.doubleToRawLongBits(d) == JDouble.doubleToRawLongBits(-0.0d)) {
    withResource(ColumnVector.fromDoubles(d)) { cv =>
      cv.getScalarElement(0)
    }
  } else {
    Scalar.fromDouble(d)
  }
case f: Float =>
  // Preserve negative zero for floats as well.
  if (f == 0.0f && JFloat.floatToRawIntBits(f) == JFloat.floatToRawIntBits(-0.0f)) {
    withResource(ColumnVector.fromDoubles(f.toDouble)) { cv =>
      cv.getScalarElement(0)
    }
  } else {
    Scalar.fromDouble(f.toDouble)
  }
case _ => throw new IllegalArgumentException(s"'$v: ${v.getClass}' is not supported" +
  s" for DoubleType, expecting Double or Float.")

Comment on lines +327 to +329
// Create via a 1-element column to preserve the exact bit pattern.
withResource(ColumnVector.fromFloats(f)) { cv =>
  cv.getScalarElement(0)

Copilot AI Mar 11, 2026


Same perf concern as the DoubleType branch: using ColumnVector.fromFloats(...).getScalarElement(0) for every float scalar creation adds extra allocations/copies. If the only correctness issue is preserving -0.0f, consider detecting signed zero (via raw bits) and only using the column-based workaround in that case, otherwise use Scalar.fromFloat.

Suggested change: replace

// Create via a 1-element column to preserve the exact bit pattern.
withResource(ColumnVector.fromFloats(f)) { cv =>
  cv.getScalarElement(0)

with

// For -0.0f, create via a 1-element column to preserve the exact bit pattern.
if (JFloat.floatToRawIntBits(f) == JFloat.floatToRawIntBits(-0.0f)) {
  withResource(ColumnVector.fromFloats(f)) { cv =>
    cv.getScalarElement(0)
  }
} else {
  Scalar.fromFloat(f)
Collaborator

@revans2 revans2 left a comment


Why? I get that Spark has a test for this, but what value is there in keeping -0.0 instead of normalizing it to 0.0? IEEE treats them as the same, and Spark has had bugs where -0.0 can cause issues. Does creating a column so that we can pull a scalar out of it cost any performance? It will cost some memory at least. Do we know where cuDF is normalizing this? I just want these questions answered before we put in a change like this.

Address review feedback: instead of routing all float/double scalar
creation through ColumnVector (which allocates device memory), detect
-0.0 via raw bit comparison and only use the slow path for that specific
value. All other values continue to use the fast Scalar.fromDouble/
Scalar.fromFloat path, making the common-case cost zero.

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992
Collaborator Author

wjxiz1992 commented Mar 12, 2026

@revans2 Let me add more background on the UT fix work:

  1. we want to increase test coverage

  2. The method is to port Spark unit tests directly (by extending them) via the rapidsTest framework, so they run on GPU.

  3. We want the GPU to produce exactly the same behavior as the CPU. For this case, even though we know 0.0 == -0.0, if the CPU produces -0.0, the GPU should also produce -0.0. Another check in Spark is this compare; it intentionally distinguishes them.

  4. The sign bit is lost at Java_ai_rapids_cudf_Scalar_makeFloat64Scalar.

The debug code used in the suite was:

testRapids("DEBUG #14116 - trace negative zero through join") {
  val negZeroRow = java.util.Arrays.asList(Row(-0.0))
  val posZeroRow = java.util.Arrays.asList(Row(0.0))
  val schema = org.apache.spark.sql.types.StructType(Seq(
    org.apache.spark.sql.types.StructField(
      "d", org.apache.spark.sql.types.DoubleType)))
  withTempView("v1", "v2") {
    spark.createDataFrame(negZeroRow, schema)
      .createTempView("v1")
    spark.createDataFrame(posZeroRow, schema)
      .createTempView("v2")

    def bits(d: Double): String =
      java.lang.Double.doubleToRawLongBits(d).toHexString

    val sb = new StringBuilder
    import com.nvidia.spark.rapids.{GpuScalar, Arm}
    Arm.withResource(
      GpuScalar((-0.0).asInstanceOf[Any],
        org.apache.spark.sql.types.DoubleType)
    ) { gs =>
      val s = gs.getBase
      sb.append(s"gpuscalar=${bits(s.getDouble)} ")
    }

    val tests = Seq(
      "plain_v1" -> "SELECT d FROM v1",
      "subq_no_join" -> "SELECT (SELECT d FROM v1)",
      "join_no_subq" ->
        "SELECT v1.d FROM v1 JOIN v2 ON v1.d = v2.d",
      "subq_with_join" -> ("SELECT (SELECT v1.d " +
        "FROM v1 JOIN v2 ON v1.d = v2.d)")
    )
    tests.foreach { case (name, query) =>
      val d = sql(query).collect().head.getDouble(0)
      sb.append(s"$name=${bits(d)} ")
    }

    val msg = sb.toString
    assert(msg.contains("gpuscalar=8000000000000000"),
      s"GpuScalar lost -0.0. All results: $msg")
    assert(msg.contains("subq_no_join=8000000000000000"),
      s"subq_no_join lost -0.0. All results: $msg")
    assert(msg.contains("subq_with_join=8000000000000000"),
      s"subq_with_join lost -0.0. All results: $msg")
  }
}

Test output was:

cudf_scalar=0 cudf_col=0 plain_v1=8000000000000000 subq_no_join=0 join_no_subq=8000000000000000 subq_with_join=0
Test            Bits              Meaning
cudf_scalar     0                 Scalar.fromDouble(-0.0) → 0.0, sign bit lost
plain_v1        8000000000000000  Direct SELECT via column path, -0.0 preserved
subq_no_join    0                 Scalar subquery (no join), goes through GpuScalar → sign bit lost
join_no_subq    8000000000000000  Join without scalar subquery, -0.0 preserved
subq_with_join  0                 Scalar subquery + join, same GpuScalar path → sign bit lost

==================================

A more dedicated test:

NegativeZeroScalarRepro.java

import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.Scalar;

/**
 * Standalone reproducer: cuDF Scalar.fromDouble(-0.0) loses the IEEE 754 sign bit.
 *
 * Run with:
 *   javac -cp <rapids-uber-jar> NegativeZeroScalarRepro.java
 *   java  -cp .:<rapids-uber-jar> NegativeZeroScalarRepro
 *
 * Expected: Scalar path should preserve -0.0 (sign bit = 0x8000000000000000)
 * Actual:   Scalar path normalizes -0.0 to 0.0 (sign bit lost)
 *
 * See: https://github.com/NVIDIA/spark-rapids/issues/14116
 */
public class NegativeZeroScalarRepro {
  public static void main(String[] args) {
    System.out.println("=== cuDF Scalar -0.0 sign bit reproducer ===\n");
    int failures = 0;

    // --- Double tests ---
    System.out.println("--- Double (-0.0d) ---");

    // Path 1: Scalar.fromDouble — this is the buggy path
    try (Scalar s = Scalar.fromDouble(-0.0d)) {
      long bits = Double.doubleToRawLongBits(s.getDouble());
      boolean preserved = bits == 0x8000000000000000L;
      System.out.printf("  Scalar.fromDouble(-0.0):        bits=0x%016x  %s%n",
          bits, preserved ? "PASS (sign bit preserved)" : "FAIL (sign bit lost)");
      if (!preserved) failures++;
    }

    // Path 2: ColumnVector — this preserves the bit pattern
    try (ColumnVector cv = ColumnVector.fromDoubles(-0.0d);
         Scalar s = cv.getScalarElement(0)) {
      long bits = Double.doubleToRawLongBits(s.getDouble());
      boolean preserved = bits == 0x8000000000000000L;
      System.out.printf("  ColumnVector(-0.0).getScalar:   bits=0x%016x  %s%n",
          bits, preserved ? "PASS (sign bit preserved)" : "FAIL (sign bit lost)");
      if (!preserved) failures++;
    }

    // --- Float tests ---
    System.out.println("\n--- Float (-0.0f) ---");

    // Path 1: Scalar.fromFloat
    try (Scalar s = Scalar.fromFloat(-0.0f)) {
      int bits = Float.floatToRawIntBits(s.getFloat());
      boolean preserved = bits == 0x80000000;
      System.out.printf("  Scalar.fromFloat(-0.0f):        bits=0x%08x  %s%n",
          bits, preserved ? "PASS (sign bit preserved)" : "FAIL (sign bit lost)");
      if (!preserved) failures++;
    }

    // Path 2: ColumnVector
    try (ColumnVector cv = ColumnVector.fromFloats(-0.0f);
         Scalar s = cv.getScalarElement(0)) {
      int bits = Float.floatToRawIntBits(s.getFloat());
      boolean preserved = bits == 0x80000000;
      System.out.printf("  ColumnVector(-0.0f).getScalar:  bits=0x%08x  %s%n",
          bits, preserved ? "PASS (sign bit preserved)" : "FAIL (sign bit lost)");
      if (!preserved) failures++;
    }

    // --- Positive zero sanity check ---
    System.out.println("\n--- Positive zero (sanity check) ---");
    try (Scalar s = Scalar.fromDouble(0.0d)) {
      long bits = Double.doubleToRawLongBits(s.getDouble());
      boolean ok = bits == 0L;
      System.out.printf("  Scalar.fromDouble(0.0):         bits=0x%016x  %s%n",
          bits, ok ? "PASS" : "FAIL");
      if (!ok) failures++;
    }
    try (Scalar s = Scalar.fromFloat(0.0f)) {
      int bits = Float.floatToRawIntBits(s.getFloat());
      boolean ok = bits == 0;
      System.out.printf("  Scalar.fromFloat(0.0f):         bits=0x%08x  %s%n",
          bits, ok ? "PASS" : "FAIL");
      if (!ok) failures++;
    }

    // --- User-visible impact ---
    System.out.println("\n--- User-visible impact (string representation) ---");
    System.out.printf("  Java Double.toString(-0.0): \"%s\"%n", Double.toString(-0.0));
    System.out.printf("  Java Double.toString(0.0):  \"%s\"%n", Double.toString(0.0));
    System.out.println("  => cast(-0.0 as string) would return \"0.0\" on GPU"
        + " instead of \"-0.0\" on CPU");

    System.out.printf("%n=== Result: %d failure(s) ===%n", failures);
    System.exit(failures > 0 ? 1 : 0);
  }
}

Repro:

// first compile
javac -cp <PATH_TO>/rapids-4-spark_2.12-26.04.0-SNAPSHOT-cuda12.jar NegativeZeroScalarRepro.java
...
// run
RAPIDS_JAR=<PATH_TO>/rapids-4-spark_2.12-26.04.0-SNAPSHOT-cuda12.jar && SLF4J_JAR=<PATH_TO>/org/slf4j/slf4j-api/1.7.36/slf4j-api-1.7.36.jar &&  java -cp .:${RAPIDS_JAR}:${SLF4J_JAR} NegativeZeroScalarRepro

==================================================

=== cuDF Scalar -0.0 sign bit reproducer ===

--- Double (-0.0d) ---
  Scalar.fromDouble(-0.0):        bits=0x0000000000000000  FAIL (sign bit lost)
  ColumnVector(-0.0).getScalar:   bits=0x8000000000000000  PASS (sign bit preserved)

--- Float (-0.0f) ---
  Scalar.fromFloat(-0.0f):        bits=0x00000000  FAIL (sign bit lost)
  ColumnVector(-0.0f).getScalar:  bits=0x80000000  PASS (sign bit preserved)

--- Positive zero (sanity check) ---
  Scalar.fromDouble(0.0):         bits=0x0000000000000000  PASS
  Scalar.fromFloat(0.0f):         bits=0x00000000  PASS

--- User-visible impact (string representation) ---
  Java Double.toString(-0.0): "-0.0"
  Java Double.toString(0.0):  "0.0"
  => cast(-0.0 as string) would return "0.0" on GPU instead of "-0.0" on CPU

=== Result: 2 failure(s) ===
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

I didn't file an issue to cuDF, as this 0.0 issue in this use case is more specific to Spark, but I can file one if you think it's needed.

  1. I updated the logic to use the ColumnVector workaround only for -0.0, for performance. All other values continue to use the fast Scalar.fromDouble/Scalar.fromFloat path.

@wjxiz1992 wjxiz1992 self-assigned this Mar 12, 2026
@revans2
Collaborator

revans2 commented Mar 12, 2026

@wjxiz1992 I think you might have misunderstood my comments. -0.0 vs 0.0 is a very minor thing. It feels very inconsequential to have a scalar value be different. I get that we want to match all of Spark's unit tests, but if it is going to make the code less maintainable and possibly slower, then we need to think about the cost of being 100% compatible with Spark and decide if it is worth it. We also need to think about the change that we are making and ask ourselves if this is the right place to make it. Here we are making multiple tiny memory allocations and copies so that we can preserve the -0.0. That is not worth it to me, which is why I asked where the error came from. We are not explicitly normalizing -0.0 to 0.0 in the code you pointed to, so I want to understand if there is something we can do there that would let us fix this in a much more efficient way. I also want to understand who added the test and why they added it. What possible case exists where we need to preserve this, especially when Spark strips it out in so many situations?

For me I am +1 if we can do this in a way that does not cause any more GPU memory allocations or GPU data movement, or make the code more difficult to work with. I am -1 in all other cases unless we can prove that this is critical to some real world situation.

@wjxiz1992
Collaborator Author

wjxiz1992 commented Mar 13, 2026

@revans2 I see, thanks for the nice explanation!

  1. I've changed my earlier assumption that "correctness (consistency with Spark) > performance"; it really depends.
  2. Yes, the JNI code doesn't explicitly normalize it (the sign bit loss is just a side effect). makeFloat64Scalar creates a cudf::scalar_type_t<double> and calls set_value(static_cast<double>(value)). There is no explicit normalization here; the jdouble → double cast preserves the sign bit. Then, in the cuDF C++ layer (cudf/cpp/src/scalar/scalar.cpp), fixed_width_scalar::set_value calls _data.set_value_async(value, stream), where _data is an rmm::device_scalar. This performs a host-to-device copy. The sign bit is lost somewhere in this RMM path; neither the JNI nor the cuDF C++ code explicitly normalizes -0.0.
  3. Let me file a cuDF issue for more investigation and discussion; this tiny unit test doesn't justify the added GPU memory costs introduced in this PR.
  4. All [AutoSparkUT]-prefixed PRs originate from a test-coverage concern raised by leadership; Mahone then introduced this "RapidsTest-extends" approach to migrate all Spark unit tests. And no, it's not from any actual customer use case; you can regard it as just a KPI for "how many Spark unit tests have been migrated". All problematic tests are currently excluded in this file, and my job is to remove them from exclusion. I'll be more cautious so this effort doesn't hurt the spark-rapids project.

@wjxiz1992
Collaborator Author

@revans2 Update: I traced the root cause down to the RMM layer.

Root cause: rmm::device_uvector::set_element_async (device_uvector.hpp L222-226) has a zero-optimization that uses cudaMemsetAsync when value == value_type{0}. Since IEEE 754 defines -0.0 == 0.0 as true, -0.0 triggers the memset path, which clears all bits including the sign bit.

cudaMemcpy(-0.0):   bits=0x8000000000000000  PASS (sign bit preserved)
cudaMemset(0):      bits=0x0000000000000000  FAIL (sign bit lost)
-0.0 == 0.0 ? TRUE (IEEE 754)

The call chain is: Scalar.fromDouble → JNI makeFloat64Scalar → cudf::fixed_width_scalar::set_value → rmm::device_scalar::set_value_async → device_uvector::set_element_async → hits the == 0 optimization → cudaMemsetAsync → sign bit gone.

Filed as rapidsai/rmm#2298 with a CUDA reproducer and suggested fix (use memcmp instead of == for floating-point zero detection).

For this PR: since the proper fix belongs in RMM (zero GPU memory overhead once fixed there), I'll keep the exclude in place and revisit after the RMM fix lands.
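To illustrate why the zero-optimization misfires (a plain Java analogy, not actual RMM code): the equality comparison RMM performs treats -0.0 as zero, while a bitwise comparison (the analogue of the suggested memcmp fix) does not:

```java
public class ZeroOptimizationPitfall {
  public static void main(String[] args) {
    double negZero = -0.0d;

    // This is the kind of check RMM effectively performs (value == value_type{0}):
    // it is true for -0.0, so the memset(0) fast path is taken and the
    // sign bit is wiped out.
    System.out.println(negZero == 0.0d); // true

    // A bitwise comparison (the memcmp analogue) sees that the bit
    // patterns differ, so the fast path would correctly be skipped.
    boolean bitwiseZero =
        Double.doubleToRawLongBits(negZero) == Double.doubleToRawLongBits(0.0d);
    System.out.println(bitwiseZero); // false
  }
}
```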




Successfully merging this pull request may close these issues.

[AutoSparkUT]"normalize special floating numbers in subquery" in SQLQuerySuite failed